DETAILED ACTION
Response to Amendment 

1. This Office Action is in response to applicant's communication filed 18 May 2010,
in response to the Office Action mailed 18 February 2010. The applicant's remarks and 
any amendments to the claims or specification were considered, with the results that 
follow. 

Continued Examination Under 37 CFR 1.114 

2. A request for continued examination under 37 CFR 1.114, including the fee set 
forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this
application is eligible for continued examination under 37 CFR 1.114, and the fee set 
forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action
has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 18 May 
2010 has been entered. 

Information Disclosure Statement 

3. As required by M.P.E.P. 609(c), the applicant's submission of the Information 
Disclosure Statement, dated 18 May 2010, is acknowledged by the examiner and the 
cited references have been considered in the examination of the claims now pending. 
As required by M.P.E.P. 609 C(2), a copy of the PTOL-1449 initialed and dated by the
examiner is attached to the instant office action.

Specification

4. The specification has not been checked to the extent necessary to determine the
presence of all possible minor errors. Applicant's cooperation is requested in correcting 
any errors of which applicant may become aware in the specification. 

Claim Rejections - 35 USC § 103 

5. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

6. Claims 1, 3, 4, 8 and 10 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Dally (US 6,192,384) in view of Garde (US 6,510,510). 

As per claim 1, Dally teaches a multi-issue processor comprising a register file as
[a parallel processing computer system (column 1, lines 9-13) including a stream 
register file 14 (figure 1)]; and a plurality of issue slots as [ALU clusters 0-7 (numeral 
18, figure 1)], each one of the plurality of issue slots including a plurality of functional 
units as [each ALU cluster 18 includes a number of ALUs 26 (figures 2-3)], an input 
routing network that provides multiple data path outputs for a single data path input as 
[crosspoint switch 30 distributes the inputs to the ALUs 26, from a single input 
from the stream register file (SRF) (figures 2-3)], the input routing network receiving 
data from the register file on the single data path input via a single data input path and
providing data from the register file to functional units of the plurality of functional units,
the data provided on the multiple data path outputs via multiple data output paths as 
[crosspoint switch 30 outputs the operands to the ALUs 26, from a single input 
from the stream register file (SRF) 14 (figures 1-3 and column 4, lines 38-58)], and 
a plurality of holdable registers that hold duplicate data from the register file, wherein in 
a first set of the plurality of issue slots the holdable registers store data on the multiple 
data output paths of the first set as [local register files 28 buffer the inputs to the 
ALUs 26 and store local constants, parameters and variables for the cluster, 
where the local register files 28 are fed by the crosspoint switch 30 (figures 1-3 
and column 4, lines 38-58)] and the holdable registers do not store data on the single
input path corresponding to the input routing networks of the first set as [local register
files 28 buffer the inputs to the ALUs 26, directly at the inputs, and store local 
constants, parameters and variables for the cluster, where the local register files 
28 are fed by the crosspoint switch 30 (figures 1-3 and column 4, lines 38-58)]. 

Dally does not explicitly teach that, in a second set of the plurality of issue slots, the
holdable registers store data on the single data input path corresponding to the input 
routing networks of the second set and the holdable registers do not store data on the 
multiple data output paths of the second set. However, it has been held that 
rearranging the parts of an invention (i.e. moving the holdable registers from the output 
path to the input path of the routing network) involves only routine skill in the art. In re 
Japikse, 86 USPQ 70. This is further evidenced by the teachings of Garde, provided 
below.

Garde teaches that, in a second set of the plurality of issue slots, the holdable registers
store data on the single data input path corresponding to the input routing networks of 
the second set as [operand latch 132 holds the output of the registers 130 on the 
input to the op busses 110 and 112, and multiplexers 162 and 168 supply inputs 
from bus 110, bus 112 or bus 114 to computation circuit 150 as operands A & B in
response to select A & B signals (column 6, lines 3-47 and figure 2)] and the
holdable registers do not store data on the multiple data output paths of the second set 
as [operand latch 132 holds the output of the registers 130 on the input to the op 
busses 110 and 112, and multiplexers 162 and 168 supply inputs from bus 110, 
bus 112 or bus 114 to computation circuit 150 as operands A & B in response to
select A & B signals (column 6, lines 3-47 and figure 2), where the crosspoint
switch 30 taught by Dally (see Dally, figure 3) serves as the input routing network. 
Therefore, the holdable registers of the combined issue slot(s) are on the single 
input path of the input routing network and not on the final output paths of the 
input routing network]. 

Dally and Garde are analogous art, as they are within the same field of endeavor, 
namely connecting a register file to multiple functional/computation units. 

It would have been obvious to one of ordinary skill in the art, at the time the 
invention was made, to use the operand latches for the register file output of Garde on 
the outputs of the stream register file/input to the crosspoint switches for some of the 
clusters taught by Dally. 



Application/Control Number: 10/511,512 Page 6

Art Unit: 2183 

Because both Dally and Garde teach systems with a register file output 
connected to a series of inputs of a number of functional/computational units, and both 
including latches/registers on the path between the register file output and the functional 
unit inputs, it would have been obvious to one of ordinary skill in the art to use the 
operand latches for the register file output of Garde on the outputs of the stream register 
file/input to the crosspoint switches for some of the clusters taught by Dally, to achieve 
the predictable result of latching the output of the stream register file before it is sent via 
the crosspoint switches to the various ALUs. This also has the added advantage of 
decreasing the size/complexity of the hardware, if the number of holdable registers is 
decreased. 
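
As an illustration only, the register-count consequence of the two placements can be sketched in Python. The model below is an assumption made for this sketch and is not taken from Dally or Garde: it treats each holdable register as a single entry per path (Dally's local register files 28 hold more than one value), and the class name, field names and the choice of four functional units are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IssueSlot:
    """Hypothetical issue slot: one routing network feeding several functional units."""
    num_functional_units: int   # output paths of the input routing network
    registers_on_input: bool    # True: latch before the network; False: registers after it

    def holdable_register_count(self) -> int:
        # One register on the single input path, or one per output path.
        return 1 if self.registers_on_input else self.num_functional_units

    def deliver(self, operand):
        # In this simplified model the network broadcasts the single input to every
        # functional unit, so both placements present the same operands.
        return [operand] * self.num_functional_units


# Illustrative values only (assumed, not from the references).
first_set = IssueSlot(num_functional_units=4, registers_on_input=False)
second_set = IssueSlot(num_functional_units=4, registers_on_input=True)
```

Under these assumptions, first_set.holdable_register_count() is 4 and second_set.holdable_register_count() is 1, while deliver() presents the same operands in both cases. This is the size/complexity tradeoff referred to above.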

As per claim 3, Dally teaches wherein the input routing network of each of the 
plurality of issue slots has a plurality of data path inputs as [the stream register file
sends data to the ALU clusters via the crosspoint switches (figures 1-3) and 
crosspoint switch 30 outputs the operands to the ALUs 26, from a single input 
from the stream register file (SRF) 14 as well as an input from outputs of the other 
ALUs in the cluster (figures 1-3 and column 4, lines 38-58)]. 

Dally does not explicitly teach that in the second set of issue slots holdable 
registers of the plurality of holdable registers are located between each of the inputs of 
the input routing network and the register file. However, it has been held that 
rearranging the parts of an invention (i.e. moving the holdable registers from the output 
path to the input path of the routing network) involves only routine skill in the art. In re
Japikse, 86 USPQ 70. This is further evidenced by the teachings of Garde, provided
below. 

Garde teaches that, in the second set of issue slots, holdable registers of the plurality of
holdable registers are located between each of the inputs of the input routing network 
and the register file as [operand latch 132 holds the output of the registers 130 on 
the input to the op busses 110 and 112, and multiplexers 162 and 168 supply 
inputs from bus 110, bus 112 or bus 114 to computation circuit 150 as operands A
& B in response to select A & B signals (column 6, lines 3-47 and figure 2)].

Dally and Garde are analogous art, as they are within the same field of endeavor, 
namely connecting a register file to multiple functional/computation units. 

It would have been obvious to one of ordinary skill in the art, at the time the 
invention was made, to use the operand latches for the register file output of Garde on 
the outputs of the stream register file/input to the crosspoint switches for some of the 
clusters taught by Dally. 

Because both Dally and Garde teach systems with a register file output 
connected to a series of inputs of a number of functional/computational units, and both 
including latches/registers on the path between the register file output and the functional 
unit inputs, it would have been obvious to one of ordinary skill in the art to use the 
operand latches for the register file output of Garde on the outputs of the stream register 
file/input to the crosspoint switches for some of the clusters taught by Dally, to achieve 
the predictable result of latching the output of the stream register file before it is sent via 
the crosspoint switches to the various ALUs. This also has the added advantage of
decreasing the size/complexity of the hardware, if the number of holdable registers is
decreased. 

As per claim 4, Dally teaches wherein, in the first set of issue slots, holdable 
registers are located between the input routing networks and each of the plurality of 
function units as [local register files 28 buffer the inputs to the ALUs 26 and store 
local constants, parameters and variables for the cluster, where the local register 
files 28 are fed by the crosspoint switch 30 (figures 1-3 and column 4, lines 38-58)].

As per claim 8, see the rejection of claim 1, above.

As per claim 10, see the rejection of claim 4, above. 

7. Claims 2, 5-7, 9, 11 and 12 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Dally (US 6,192,384) in view of Garde (US 6,510,510), and further in 
view of Fisher (US 6,026,479). 

As per claim 2, Dally teaches the multi-issue processor of claim 1, as described
above.

Dally does not teach a first instruction set accessing at least the first set of issue 
slots; and a second instruction set accessing the second set of issue slots, however.

Fisher teaches "a first instruction set accessing at least the first set of issue slots;
and a second instruction set accessing the second set of issue slots" as ["A CPU 
having a cluster VLIW architecture... which operates in both a high instruction 
level parallelism (ILP) mode and a low ILP mode. In high ILP mode, the CPU 
executes wide instruction words using all operational clusters of the CPU and all 
of a main instruction cache and main data cache of the CPU are accessible to a 
high ILP task. The CPU also includes a mini-instruction cache, a mini-instruction 
register and a mini-data cache which are inactive during high ILP mode. An 
instruction level controller in the CPU receives a low ILP signal, such as an 
interrupt or function call to a low ILP routine, and switches to low ILP mode. In 
low ILP mode, the main instruction cache and main data cache are deactivated to 
preserve their contents. At the same time, a predetermined cluster remains active 
while the remaining clusters are also deactivated. The low ILP task executes 
instructions from the mini-instruction cache which are input to the predetermined 
cluster through the mini-instruction register. The mini-data cache stores 
operands for the low ILP task" (abstract, lines 1-19)].

Dally and Fisher are analogous art, as they are within the same field of endeavor, 
namely instruction processing. 

At the time of the invention it would have been obvious to a person of ordinary 
skill in the art to combine the parallel processor with a register file connected to a 
number of ALU clusters via crosspoint switches controlling the inputs of the functional 
units of each ALU in the clusters, taught by Dally, with the multiple instruction sets and
multiple groupings of resources for each, as taught by Fisher, while using the latch
locations taught by Garde for the separate resource cluster taught by Fisher. 

The motivation for doing so is provided by Fisher as ["the separate
mini-instruction cache and mini-data cache along with the use of only the
predetermined cluster minimizes the pollution of the main instruction and data 
caches, as well as pollution of register files in the deactivated clusters, with 
regard to a task executing in high ILP mode" (abstract, lines 20-24)].

As per claim 5, Dally teaches the multi-issue processor of claim 1, as described
above.

Dally does not teach wherein the first set of issue slots are accessed by a first set 
of instructions for a VLIW processor and the second set of issue slots are accessed by 
a second set of instructions that are used by an interrupt routine, however. 

Fisher teaches wherein the first set of issue slots are accessed by a first set of 
instructions for a VLIW processor and the second set of issue slots are accessed by a 
second set of instructions that are used by an interrupt routine as ["A CPU having a 
cluster VLIW architecture... which operates in both a high instruction level
parallelism (ILP) mode and a low ILP mode. In high ILP mode, the CPU executes 
wide instruction words using all operational clusters of the CPU and all of a main 
instruction cache and main data cache of the CPU are accessible to a high ILP 
task. The CPU also includes a mini-instruction cache, a mini-instruction register 
and a mini-data cache which are inactive during high ILP mode. An instruction
level controller in the CPU receives a low ILP signal, such as an interrupt or
function call to a low ILP routine, and switches to low ILP mode. In low ILP mode, 
the main instruction cache and main data cache are deactivated to preserve their 
contents. At the same time, a predetermined cluster remains active while the 
remaining clusters are also deactivated. The low ILP task executes instructions 
from the mini-instruction cache which are input to the predetermined cluster 
through the mini-instruction register. The mini-data cache stores operands for the 
low ILP task" (abstract, lines 1-19)]. 

Dally and Fisher are analogous art, as they are within the same field of endeavor, 
namely instruction processing. 

At the time of the invention it would have been obvious to a person of ordinary 
skill in the art to combine the parallel processor with a register file connected to a 
number of ALU clusters via crosspoint switches controlling the inputs of the functional 
units of each ALU in the clusters, taught by Dally, with the multiple instruction sets and 
multiple groupings of resources for each, as taught by Fisher, while using the latch
locations taught by Garde for the separate resource cluster taught by Fisher. 

The motivation for doing so is provided by Fisher as ["the separate
mini-instruction cache and mini-data cache along with the use of only the
predetermined cluster minimizes the pollution of the main instruction and data 
caches, as well as pollution of register files in the deactivated clusters, with 
regard to a task executing in high ILP mode" (abstract, lines 20-24)].

As per claim 6, Fisher teaches wherein the second set of instructions has fewer
instructions than the first set of instructions as [An embodiment of a method for 
reducing cache pollution in a CPU, according to the present invention, includes 
providing a main instruction cache configured to store VLIW instructions, 
wherein each VLIW instruction is further comprised of a plurality of
c-instructions, providing a plurality of operational clusters, wherein each one of the
plurality of operational clusters is configured to receive one of the plurality of
c-instructions of each VLIW instruction in the main instruction cache, and
executing a high ILP task by loading VLIW instructions from the main instruction 
cache into a main instruction register for output to the plurality of clusters. The 
method includes receiving a low ILP signal and, responsive thereto, deactivating 
the main instruction cache and main instruction register, deactivating the 
plurality of operational clusters, except for a predetermined one of the 
operational clusters, activating a mini-instruction cache and a mini-instruction 
register, and serially executing a low ILP task by serially loading c-instructions 
from the mini-instruction cache into the mini-instruction cache for output to the 
predetermined one of the operational clusters (column 4, lines 16-34)]. 

As per claim 7, Dally teaches the multi-issue processor of claim 1, as described
above.

Dally does not explicitly teach wherein the first set of issue slots has more issue 
slots than the second set of issue slots, however.

Fisher teaches wherein the first set of issue slots has more issue slots than the
second set of issue slots as ["The CPU also includes a mini-instruction cache, a 
mini-instruction register and a mini-data cache which are inactive during high ILP 
mode. An instruction level controller in the CPU receives a low ILP signal, such 
as an interrupt or function call to a low ILP routine, and switches to low ILP mode. 
In low ILP mode, the main instruction cache and main data cache are deactivated 
to preserve their contents. At the same time, a predetermined cluster remains 
active while the remaining clusters are also deactivated. The low ILP task 
executes instructions from the mini-instruction cache which are input to the 
predetermined cluster through the mini-instruction register. The mini-data cache 
stores operands for the low ILP task" (abstract, lines 6-19) wherein deactivating
some clusters means the cluster running the low ILP tasks (the second set of 
issue slots) is smaller than the first]. 
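
The relative sizes of the two sets can be illustrated with a short Python sketch of the mode switch described in the Fisher abstract. The function name, mode strings, cluster numbering and the count of eight clusters are assumptions made for this sketch, not details from Fisher.

```python
def active_clusters(all_clusters, mode, predetermined_index=0):
    """Return the clusters that remain active in the given ILP mode (hypothetical model)."""
    if mode == "high_ilp":
        # High ILP mode: wide instruction words use all operational clusters.
        return list(all_clusters)
    if mode == "low_ilp":
        # Low ILP mode: only the predetermined cluster stays active.
        return [all_clusters[predetermined_index]]
    raise ValueError(f"unknown mode: {mode}")


clusters = list(range(8))                      # assumed cluster count
first_set = active_clusters(clusters, "high_ilp")
second_set = active_clusters(clusters, "low_ilp")
```

In this sketch the set of clusters available in high ILP mode always contains more clusters than the single predetermined cluster used in low ILP mode, whenever more than one cluster exists.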

Dally and Fisher are analogous art, as they are within the same field of endeavor, 
namely instruction processing. 

At the time of the invention it would have been obvious to a person of ordinary 
skill in the art to combine the parallel processor with a register file connected to a 
number of ALU clusters via crosspoint switches controlling the inputs of the functional 
units of each ALU in the clusters, taught by Dally, with the multiple instruction sets and 
multiple groupings of resources for each, as taught by Fisher, while using the latch
locations taught by Garde for the separate resource cluster taught by Fisher.

The motivation for doing so is provided by Fisher as ["the separate
mini-instruction cache and mini-data cache along with the use of only the
predetermined cluster minimizes the pollution of the main instruction and data 
caches, as well as pollution of register files in the deactivated clusters, with 
regard to a task executing in high ILP mode" (abstract, lines 20-24)].

As per claim 9, see the rejection of claim 2, above. 

As per claim 11, see the rejection of claim 6, above.

As per claim 12, see the rejection of claim 7, above. 

Response to Arguments 

8. Applicant's arguments filed 18 May 2010 have been fully considered but they are 
not persuasive. 

9. Applicant argues that the cited art does not teach a second set of issue slots that 
have holdable registers on the single data input path of the input routing network and do 
not have holdable registers on the multiple data output paths of the input routing 
network. 

However, Garde teaches operand latch 132 holds the output of the registers 130 
on the input to the op busses 110 and 112, and multiplexers 162 and 168 supply inputs
from bus 110, bus 112 or bus 114 to computation circuit 150 as operands A & B in
response to select A & B signals (column 6, lines 3-47 and figure 2). The
multiplexers, used for switching, are a part of the input routing network taught by Garde, 
and there are no holdable registers on the output paths of the multiplexers, or on the 
inputs of the computation circuits. In the combination of Dally and Garde, provided 
above, the crosspoint switch 30 taught by Dally (see Dally, figure 3) serves as the input 
routing network. Therefore, the holdable registers of the combined issue slot(s) are on 
the single input path of the input routing network and not on the final output paths of the 
input routing network. 

10. In response to applicant's argument that there is no teaching, suggestion, or 
motivation to combine the references, the examiner recognizes that obviousness may 
be established by combining or modifying the teachings of the prior art to produce the 
claimed invention where there is some teaching, suggestion, or motivation to do so 
found either in the references themselves or in the knowledge generally available to one 
of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 
1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR 
International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007). 

In this case, because both Dally and Garde teach systems including 
latches/registers between the output of the register file and the inputs of the multiple 
functional units/computation circuits, while Dally teaches placing the registers at the 
multiple output paths of the input routing network (i.e., crosspoint switch) and Garde
teaches placing the latches on the input of the input routing network (i.e., series of op
busses and multiplexers), it would have been obvious to one of ordinary skill to
substitute the register/latch placement of Dally with that taught by Garde, in at least 
some of the issue slots taught by Dally, to achieve the predictable result of latching the 
output of the stream register file before it is sent via the crosspoint switches to the 
various ALUs. This also has the added advantage of decreasing the size/complexity of 
the hardware, if the number of holdable registers is decreased. 

11. Applicant further argues that Dally teaches away from the modification with
Garde, as the local registers at the inputs of the ALUs taught by Dally are used to 
provide a tiered storage architecture. 

However, as shown above, Garde also provides for operand registers/latches.
The combination does not remove the holdable registers in these issue slots, but rather 
changes the placement of the registers in relation to the register file/routing 
network/functional units. Thus the Dally reference does not teach away from such a 
combination, as a "tiered storage architecture" is still provided. Furthermore, even if the 
number of registers in this tier were decreased (which is not required by the 
combination), this would not change the principle of operation of the Dally reference,
instead providing a tradeoff between the complexity/size of hardware and increased 
performance due to extra register availability.

Conclusion

12. The following is a summary of the treatment and status of all claims in the
application as recommended by M.P.E.P. 707.07(i): claims 1-12 are rejected. 

13. The prior art made of record and not relied upon is considered pertinent to
applicant's disclosure. 

a. Balmer (US 2002/0108026) - discloses holdable registers on register file
outputs for transferring data between datapaths. 

b. Hao (4,594,655) - discloses staging registers on the outputs of the 
register file for holding operands to be sent to the ALUs. 

14. The examiner requests, in response to this Office action, that support be shown 
for language added to any original claims on amendment and any new claims. That is, 
indicate support for newly added claim language by specifically pointing to page(s) and 
line number(s) in the specification and/or drawing figure(s). This will assist the examiner 
in prosecuting the application. 

15. When responding to this office action, applicant is advised to clearly point out the
patentable novelty which he or she thinks the claims present, in view of the state of the 
art disclosed by the references cited or the objections made. He or she must also show 
how the amendments avoid such references or objections. See 37 CFR 1.111(c).

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to GEORGE D. GIROUX whose telephone number is 
(571)272-9769. The examiner can normally be reached on Monday through Friday, 
9:30am - 6:00pm E.S.T. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Eddie P. Chan, can be reached on 571-272-4162. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/Eddie P Chan/ /George D Giroux/ 

Supervisory Patent Examiner, Art Unit 2183 Examiner, Art Unit 2183



